Particle Swarm Optimization for clustering short-text corpora
Authors
Abstract
Clustering short-text collections is a highly relevant research area, given the current and likely future tendency of people to communicate in “small-language” (e.g. blogs, snippets, news, and text messages such as email or chat). In recent years, a few approaches based on Particle Swarm Optimization (PSO) have been proposed to solve document clustering problems. However, the particular issues that arise when such approaches are used to cluster corpora of very short documents have received little attention from the computational linguistics community, perhaps because of how challenging the problem is. In this work, we propose several variants of PSO methods to deal with this kind of corpus. Our proposal includes two very different approaches to the clustering problem, which differ essentially in the representation used to maintain information about the clusterings under consideration. We optimize two unsupervised measures of cluster validity: the Expected Density Measure ρ̄ and the Global Silhouette coefficient. In recent work on short-text clustering, these measures have shown an interesting level of correlation with the “true” categorizations provided by a human expert. The experimental results show that PSO-based approaches can be highly competitive alternatives for clustering short-text corpora and can, in some cases, outperform the most effective clustering algorithms used in this area.
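To make one of the two validity measures concrete, here is a minimal Python sketch of a global silhouette score. It assumes the common definition (the per-document silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged within each cluster and then across clusters), which the abstract itself does not spell out. The function name `global_silhouette`, the singleton convention, and the toy distance matrix are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def global_silhouette(dist, labels):
    """Mean over clusters of the mean silhouette of each cluster's members.

    dist   -- square matrix (list of lists) of pairwise document distances
    labels -- labels[i] is the cluster assigned to document i
    Assumed definition; the paper may differ in details.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    per_cluster = []
    for lab, members in clusters.items():
        scores = []
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # assumed convention for singleton clusters
                continue
            # a: mean distance from i to the other members of its own cluster
            a = sum(dist[i][j] for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance from i to the members of another cluster
            b = min(
                sum(dist[i][j] for j in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
        per_cluster.append(sum(scores) / len(scores))
    return sum(per_cluster) / len(per_cluster)


# Toy data: documents 0,1 are close to each other, as are documents 2,3.
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
good = global_silhouette(dist, [0, 0, 1, 1])  # matches the natural grouping
bad = global_silhouette(dist, [0, 1, 0, 1])   # splits both natural groups
```

A PSO-based clusterer of the kind described above could use such a score as the fitness of a candidate label assignment, since it needs no reference categorization; here the natural grouping scores higher than the mixed one.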
Similar resources
A Particle Swarm Optimizer to cluster short-text corpora: a performance study
Short-text clustering is currently an important research area because of its applicability to web information retrieval, text generation and text mining. Some previous works have demonstrated the effectiveness of a discrete Particle Swarm Optimizer algorithm, named CLUDIPSO, for clustering corpora containing very short documents. In these studies, CLUDIPSO was evaluated with small collections a...
Performance analysis of Particle Swarm Optimization applied to unsupervised categorization of short texts / Análisis de Prestación de Particle Swarm Optimization aplicado a Categorización no Supervisada de Textos Cortos
Nowadays there is a need to access online information such as abstracts, news, opinions, product evaluations, etc. That information is generally available on the web as short texts. Previous works have demonstrated the effectiveness of a discrete Particle Swarm Optimization algorithm, named CLUDIPSO, for clustering small short-text corpora. This article presents a preliminary study abou...
A Discrete Particle Swarm Optimizer for Clustering Short-text Corpora
Work on “short-text clustering” is relevant, particularly if we consider the current/future mode for people to use ‘small-language’, e.g. blogs, text-messaging, snippets, etc. Potential applications in different areas of natural language processing may include re-ranking of snippets in information retrieval, and automatic clustering of scientific texts available on the Web. Despite its relevanc...
A Particle Swarm Optimizer to Cluster Parallel Spanish-English Short-text Corpora / Un Optimizador basado en Cúmulo de Partículas para el Agrupamiento de Textos Cortos de Colecciones Paralelas en Español-Inglés
Short-texts clustering is currently an important research area because of its applicability to web information retrieval, text summarization and text mining. These texts are often available in different languages and parallel multilingual corpora. Some previous works have demonstrated the effectiveness of a discrete Particle Swarm Optimizer algorithm, named CLUDIPSO, for clustering monolingual ...
Fuzzy Particle Swarm Optimization Algorithm for a Supplier Clustering Problem
This paper presents a fuzzy decision-making approach to deal with a clustering supplier problem in a supply chain system. During recent years, determining suitable suppliers in the supply chain has become a key strategic consideration. However, the nature of these decisions is usually complex and unstructured. In general, many quantitative and qualitative factors, such as quality, price, and fl...
Publication date: 2009